Abstract. Navigational graph queries are an important class of queries that can
extract implicit binary relations over the nodes of input graphs. Most of the
navigational query languages used in the RDF community, e.g. property paths in
W3C SPARQL 1.1 and nested regular expressions in nSPARQL, are based on
regular expressions. It is known that regular expressions have limited expressivity;
for instance, some natural queries, like same generation-queries, are not
expressible with regular expressions. To overcome this limitation, in this paper,
we present cfSPARQL, an extension of the SPARQL query language equipped with
context-free grammars. The cfSPARQL language is strictly more expressive than
property paths and nested expressions. The additional expressivity can be used
for modelling graph similarities, graph summarization and ontology alignment.
Despite the increased expressivity, we show that cfSPARQL still enjoys a low
computational complexity and can be evaluated efficiently.


1 Introduction 

The Resource Description Framework (RDF) [?], recommended by the World Wide Web
Consortium (W3C), is a standard graph-oriented model for data interchange on the Web.
RDF has a broad range of applications in the semantic web, social networks,
bioinformatics, geographical data, etc. [?]. Access to graph-structured data is typically
navigational in nature [6,21,12]. Navigational queries on graph databases return binary
relations over the nodes of the graph [9]. Many existing navigational query languages
for graphs are based on binary relational algebra such as XPath (a standard navigational
query language for trees [25]) or regular expressions such as RPQ (regular path queries)
[?].

SPARQL [32], recommended by the W3C, has been the standard language for querying
RDF data since 2008, inheriting from classical relational languages such as SQL.
However, SPARQL only provides limited navigational functionalities for RDF [28,37].
Recently, several languages with navigational capabilities for querying RDF graphs
have been proposed [26,19,7,28,5,3,4,11,35]. Roughly, Versa [?] is the first language for
RDF with navigational capabilities, using XPath over the XML serialization of RDF
graphs. SPARQLeR, proposed by Kochut et al. [?], extends SPARQL by allowing path
variables. SPARQL2L, proposed by Anyanwu et al. [?], allows path variables in graph
patterns and offers useful features such as constraints on nodes and edges. PSPARQL,
proposed by Alkhateeb et al. [?], extends SPARQL by allowing regular expressions in
general triple patterns with possibly blank nodes, and CPSPARQL, further proposed by
Alkhateeb et al. [?], allows constraints over regular expressions in PSPARQL where
variables are allowed in regular expressions. nSPARQL, proposed by Perez et al. [28],
extends SPARQL by allowing nested regular expressions in triple patterns. Indeed,
nSPARQL is still expressible in SPARQL if the transitive closure relation is absent
[?]. In March 2013, SPARQL 1.1 [?], recommended by the W3C, introduced property paths,
which strengthen the navigational capabilities of SPARQL 1.0 [?].

However, those regular expression-based extensions of SPARQL are still limited
in representing some more expressive navigational queries which cannot be expressed
by regular expressions. Let us consider a fictional biomedical ontology mentioned in
[?] (see Figure 1). We are interested in a navigational query about those paths that
confer similarity (e.g., between Gene(B) and Gene(C)), which suggests a causal
relationship (e.g., between Gene(S) and Phenotype(T)). This query about similarity arises
from the well-known same generation-query [?], which is proven to be inexpressible
by any regular expression. To express the query, we have to introduce a query
embedded with a context-free grammar (CFG) for expressing the language
$\{ww^{T} \mid w \text{ is a string}\}$ [?], where $w^{T}$ is the converse of $w$. For instance,
if $w$ = "abcdfe" then $w^{T}$ = "$e^{-1}f^{-1}d^{-1}c^{-1}b^{-1}a^{-1}$". As we know, CFGs have more
expressive power than regular expressions [?]. Moreover, context-free grammars can
provide a simplified, more user-friendly dialect of Datalog [1] which still allows
powerful recursion [?]. Besides, context-free graph queries also have practical query
evaluation strategies. For instance, there are some applications in verification [?].
So it is interesting to introduce a navigational query embedded with context-free
grammars to express more practical queries like the same generation-query.

A proposal of conjunctive context-free path queries (referred to as Helling’s CCFPQ)
for edge-labeled directed graphs has been presented by Helling [14], allowing
context-free grammars in path queries. A naive idea for expressing same generation-queries
is to transform the RDF graph into an edge-labeled directed graph via navigation axes [28]
and then use Helling’s CCFPQ, since an RDF graph can intuitively be taken as an
edge-labeled directed graph. However, this transformation can hardly capture the full
information of the RDF graph, since there are some slight differences between RDF
graphs and edge-labeled directed graphs, particularly regarding connectivity [?];
thus it cannot express some regular expression-based path queries on RDF graphs.
For instance, a nested regular expression (nre) of the form axis :: [e] on RDF graphs
in nSPARQL [28] always evaluates to the empty set over any edge-labeled directed
graph. That is to say, an nre of the form “axis :: [e]” is hardly expressible in Helling’s
CCFPQ.

Representing more expressive queries with efficient query evaluation is a topic of
renewed interest in the classical field of graph databases [?]. Hence, in this paper, we
present a context-free extension of path queries and SPARQL queries on RDF graphs
which can express both nre and nSPARQL [28]. Furthermore, we study several
fundamental properties of the proposed context-free path queries and context-free SPARQL
queries. The main contributions of this paper can be summarized as follows:

- We present context-free path queries (CFPQ), including conjunctive context-free
path queries (CCFPQ), unions of simple conjunctive context-free path queries
(UCCFPQ$^{s}$), and unions of conjunctive context-free path queries (UCCFPQ), for
RDF graphs and find that CFPQ, CCFPQ, and UCCFPQ have efficient query evaluation,
with polynomial data complexity and NP-complete combined complexity. Finally, we
implement our CFPQs and run experiments on some popular ontologies.

- We discuss the expressiveness of CFPQs by referring to nested regular expressions
(nre). We show that CFPQ, CCFPQ, UCCFPQ$^{s}$, and UCCFPQ exactly express
four fragments of nre: basic nre “nre$_0$”, union-free nre “nre$_0$(N)”, nesting-free nre
“nre$_0$(|)”, and full nre, respectively (see Figure 2). The query evaluation of
cfSPARQL has the same complexity as SPARQL.

- We propose context-free SPARQL (cfSPARQL) and union of conjunctive context-free
SPARQL (uccfSPARQL) based on CFPQ and UCCFPQ, respectively. We show
that cfSPARQL has the same expressiveness as uccfSPARQL. Furthermore,
we prove that cfSPARQL can strictly express both SPARQL and nSPARQL (even
nSPARQL$^{\neg}$, a variant of nSPARQL allowing nre with negation, “nre$^{\neg}$”) (see
Figure 3).

Fig. 1: A biomedical ontology [?]

Organization of the paper Section 2 recalls nSPARQL and context-free grammar.
Section 3 defines CFPQ. Section 4 discusses the expressiveness of CFPQ. Section 5
presents cfSPARQL and Section 6 discusses the relations on nre with negation.
Section 7 evaluates experiments. We conclude in Section 8. Due to the space limitation, all
proofs and some further preliminaries are omitted but they are available in an extended
technical report on arXiv.org [?].

2 Preliminaries 

In this section, we introduce the language nSPARQL and context-free grammar. 

2.1 The syntax and semantics of nSPARQL 

In this subsection, we recall the syntax and semantics of nSPARQL, largely following
the excellent expositions [28,27].




























RDF graphs An RDF statement is a subject-predicate-object structure, called an RDF
triple, which represents resources and the properties of those resources. For the sake of
simplicity, similarly to [28], we assume that RDF data is composed only of IRIs¹. Formally,
let $U$ be an infinite set of IRIs. A triple $(s, p, o) \in U \times U \times U$ is called an RDF triple.
An RDF graph $G$ is a finite set of RDF triples. We use $adom(G)$ to denote the active
domain of $G$, i.e., the set of all elements from $U$ occurring in $G$.

For instance, the biomedical ontology shown in Figure 1 can be modeled as an RDF
graph named $G_{bio}$ where each labeled edge of the form $a \xrightarrow{p} b$ is directly translated
into a triple $(a, p, b)$.

Paths and traces Let $G$ be an RDF graph. A path $\pi = (c_1 c_2 \ldots c_m)$ in $G$ is a
non-empty finite sequence of constants from $G$, where, for every $i \in \{1, \ldots, m-1\}$,
$c_i$ and $c_{i+1}$ occur in the same triple of $G$ (e.g., $(c_i, c, c_{i+1})$, $(c_i, c_{i+1}, c)$, or
$(c, c_i, c_{i+1})$). Note that the precedence between $c_i$ and $c_{i+1}$ in a path is independent
of the positions of $c_i$, $c_{i+1}$ in a triple.

In nSPARQL, three different navigation axes, namely, next, edge, and node, and
their inverses, i.e., $next^{-1}$, $edge^{-1}$, and $node^{-1}$, are introduced to move through an
RDF triple $(s, p, o)$ [28].

Let $\Sigma = \{axis, axis::c \mid c \in U\}$ where $axis \in \{self, next, edge, node, next^{-1},
edge^{-1}, node^{-1}\}$. Let $G$ be an RDF graph. We use $\Sigma(G)$ to denote the set of all
symbols $\{axis, axis::c \mid c \in adom(G)\}$ occurring in $G$.

Let $\pi = (c_1 \ldots c_m)$ be a path in $G$. A trace of path $\pi$ is a string over $\Sigma(G)$ written
as $\mathcal{T}(\pi) = l_1 \ldots l_{m-1}$ where, for all $i \in \{1, \ldots, m-1\}$, $(c_i c_{i+1})$ is labeled by $l_i$ and
$l_i$ is of the form $axis$, $axis::c$, $axis^{-1}$, or $axis^{-1}::c$ [28]. We use $Trace(\pi)$ to denote
the set of all traces of $\pi$.

Note that a path may have multiple traces since any two nodes may
occur together in multiple triples. For example, consider an RDF graph $G = \{(a, b, c), (a, c, b)\}$
and a path $\pi = (abc)$; both $(edge::c)(node::a)$ and $(next::c)(node^{-1}::a)$
are traces of $\pi$.
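
As an informal illustration of the example above, the following Python sketch enumerates the traces of a path that use labels of the form axis::c. It is only a sketch under our own assumptions: the function names `axis_edges` and `traces` and the string encoding of labels (e.g., `next^-1::c`) are hypothetical and not part of nSPARQL, and bare axis labels without a constant are left out for brevity.

```python
from itertools import product


def axis_edges(graph):
    """Labelled steps induced by the navigation axes for every triple (s, p, o)."""
    edges = set()
    for s, p, o in graph:
        edges |= {
            (s, f"next::{p}", o), (o, f"next^-1::{p}", s),
            (s, f"edge::{o}", p), (p, f"edge^-1::{o}", s),
            (p, f"node::{s}", o), (o, f"node^-1::{s}", p),
        }
    return edges


def traces(graph, path):
    """All traces of `path` built from labels of the form axis::c."""
    edges = axis_edges(graph)
    steps = [
        sorted(label for (a, label, b) in edges if a == x and b == y)
        for x, y in zip(path, path[1:])
    ]
    # One label per consecutive pair of nodes; a trace is any combination.
    return {tuple(t) for t in product(*steps)}


# The two-triple graph and the path (abc) from the example in the text.
G = {("a", "b", "c"), ("a", "c", "b")}
result = traces(G, ["a", "b", "c"])
```

Under this encoding, both traces mentioned in the text appear in `result`, together with the two mixed combinations of the same labels.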

For instance, in the RDF graph $G_{bio}$ (see Figure 1), a path from Gene(B) to Gene(C)
has a trace: $(next::locatedIn)(next^{-1}::linkedTo)(next::linkedTo)(next^{-1}::locatedIn)$.

Nested regular expressions Nested regular expressions (nre) are defined by the
following formal syntax:

$e := axis \mid axis::c\ (c \in U) \mid axis::[e] \mid e/e \mid e|e \mid e^{*}$.

Here the nesting nre is of the form $axis::[e]$.

For simplification, we denote some interesting fragments of nre as follows:

- nre$_0$: basic nre, i.e., nre only consisting of “axis”, “/”, and “*”;

- nre$_0$(|): basic nre by adding the operator “|”;

- nre$_0$(N): basic nre by adding nesting nre $axis::[e]$.

¹ Standard RDF data is composed of IRIs, blank nodes, and literals. For the purposes of this
paper, the distinction between IRIs and literals will not be important.



Patterns Assume an infinite set $V$ of variables, disjoint from $U$. A nested regular
expression triple (or nre-triple) is a tuple of the form $(?x, e, ?y)$, where $?x, ?y \in V$ and
$e$ is an nre².

Formally, nSPARQL (graph) patterns are recursively constructed from nre-triples:

- An nre-triple is an nSPARQL pattern;

- $P_1$ UNION $P_2$, $P_1$ AND $P_2$, and $P_1$ OPT $P_2$ are nSPARQL patterns if $P_1$
and $P_2$ are nSPARQL patterns;

- $P$ FILTER $C$ is an nSPARQL pattern if $P$ is an nSPARQL pattern and $C$ is a constraint;

- SELECT$_S(P)$ is an nSPARQL pattern if $P$ is an nSPARQL pattern and $S$ is a set of variables.

Semantics Given an RDF graph $G$ and an nre $e$, the evaluation of $e$ on $G$, denoted
by $[\![e]\!]_G$, is a binary relation. More details can be found in [28]. Here, we recall the
semantics of nesting nre of the form $axis::[e]$ as follows:

$[\![axis::[e]]\!]_G = \{(a, b) \mid \exists c, d \in adom(G),\ (a, b) \in [\![axis::c]\!]_G \text{ and } (c, d) \in [\![e]\!]_G\}$.

The semantics of nSPARQL patterns is defined in terms of sets of so-called mappings,
which are simply total functions $\mu : S \to U$ on some finite set $S$ of variables. We
denote the domain $S$ of $\mu$ by $dom(\mu)$.

Basically, the semantics of an nre-triple $(u, e, v)$ is defined as follows:

$[\![(u, e, v)]\!]_G = \{\mu : \{u, v\} \cap V \to U \mid (\mu(u), \mu(v)) \in [\![e]\!]_G\}$.

Here, for any mapping $\mu$ and any constant $c \in U$, we agree that $\mu(c)$ equals $c$ itself.

Let $P$ be an nSPARQL pattern. The semantics of $P$ on $G$, denoted by $[\![P]\!]_G$, is
analogously defined as usual following the semantics of SPARQL [28,27].

Query evaluation A SPARQL (SELECT) query is an nSPARQL pattern. Given an RDF
graph $G$, a pattern $P$, and a mapping $\mu$, the query evaluation problem of nSPARQL is
to decide whether $\mu$ is in $[\![P]\!]_G$. The complexity of the query evaluation problem is
PSpace-complete [?].

2.2 Context-free grammar 

In this subsection, we recall context-free grammars. For more details, we refer
interested readers to references on formal languages [?].

A context-free grammar (CFG) is a 3-tuple $\mathcal{G} = (N, A, R)$³ where

- $N$ is a finite set of variables (called non-terminals);

- $A$ is a finite set of constants (called terminals);

- $R$ is a finite set of production rules $r$ of the form $v \to S$, where $v \in N$ and
$S \in (N \cup A)^{*}$ (the asterisk * represents the Kleene star operation). We write $v \to \varepsilon$
if $\varepsilon$ is the empty string.

² In nSPARQL [28], nre-triples allow a general form $(v, e, u)$ where $u, v \in U \cup V$. In this paper,
we mainly consider the case $u, v \in V$ to simplify our discussion.

³ We deviate from the usual definition of context-free grammar by not including a special start
non-terminal, following [?].





A string over $N \cup A$ can be rewritten into a new string over $N \cup A$ by applying production
rules. Consider a string $avb$ and a production rule $r : v \to avb$; we can obtain a
new string $aavbb$ by applying the rule $r$ once and another new string $aaavbbb$ by
applying the rule $r$ twice. Analogously, strings of increasing length can be obtained
in this way.

Let $S, T \in (N \cup A)^{*}$. We write $S \xrightarrow{*}_{\mathcal{G}} T$ if $T$ can be obtained from $S$ by applying
production rules of $\mathcal{G}$ a finite number of times.

The language of grammar $\mathcal{G} = (N, A, R)$ w.r.t. start non-terminal $v \in N$ is defined
by $\mathcal{L}(\mathcal{G}_v) = \{S \text{ a finite string over } A \mid v \xrightarrow{*}_{\mathcal{G}} S\}$.

For example, let $\mathcal{G} = (N, A, R)$ where $N = \{v\}$, $A = \{a, b\}$, and $R = \{v \to ab, v \to
avb\}$. Then $\mathcal{L}(\mathcal{G}_v) = \{a^{n}b^{n} \mid n \geq 1\}$.
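
The example grammar can be explored with a small Python sketch that enumerates the terminal strings derivable from a start non-terminal up to a length bound. This is an illustration only: the function name `derive`, the single-character encoding of symbols, and the pruning rule are our own assumptions, and the search is only guaranteed to terminate when every production contributes at least one terminal, as is the case here.

```python
def derive(rules, start, max_len):
    """Terminal strings of length <= max_len derivable from `start`.

    rules maps a one-character non-terminal to a list of right-hand sides.
    """
    nonterminals = set(rules)
    seen, frontier, words = {start}, [start], set()
    while frontier:
        form = frontier.pop()
        # Position of the leftmost non-terminal, if any.
        idx = next((i for i, ch in enumerate(form) if ch in nonterminals), None)
        if idx is None:
            words.add(form)
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(ch not in nonterminals for ch in new)
            # Prune sentential forms that already carry too many terminals.
            if terminals <= max_len and new not in seen:
                seen.add(new)
                frontier.append(new)
    return words


# R = {v -> ab, v -> avb} from the example in the text.
rules = {"v": ["ab", "avb"]}
words = derive(rules, "v", 6)
```

With a bound of 6, `words` contains exactly the strings $a^{n}b^{n}$ for $n = 1, 2, 3$.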

3 Context-free path queries 

In this section, we introduce context-free path queries on RDF graphs based on
context-free path queries on directed graphs [14] and nested regular expressions [28].

3.1 Context-free path queries and their extensions

In this subsection, we first define conjunctive context-free path queries on RDF graphs
and then present some variants (which can also be seen as extensions).

Conjunctive context-free path queries In this paper, we assume that $N \cap V = \emptyset$ and
$A \subseteq \Sigma$ for all CFG $\mathcal{G} = (N, A, R)$.

Definition 1. Let $\mathcal{G} = (N, A, R)$ be a CFG and $m$ a positive integer. A conjunctive
context-free path query (CCFPQ) is of the form $q(?x, ?y)$ where

$q(?x, ?y) := \bigwedge_{i=1}^{m} \alpha_i$,    (1)

where

- $\alpha_i$ is a triple pattern either of the form $(?x, ?y, ?z)$ or of the form $v(?x, ?y)$;

- $\{?x, ?y\} \subseteq vars(q)$ where $vars(q)$ denotes the collection of all variables occurring
in the body of $q$;

- $\{v_1, \ldots, v_m\} \subseteq N$.

We regard the name of query $q(?x, ?y)$ as $q$ and call the right-hand side of Equation (1) the
body of $q$.

Remark 1. In our CCFPQ, we allow a triple pattern of the form $(?x, ?y, ?z)$ to
characterize those queries w.r.t. ternary relationships such as nre-triple patterns of nSPARQL
[28], to be discussed in Section 4. The formula $v(?x, ?y)$ is used to capture context-free
path queries [14].

In this paper, we simply write a conjunctive query as a Datalog rule. 



We call a CCFPQ a simple conjunctive context-free path query (written CCFPQ$^{s}$) if only
the form $v(?x, ?y)$ is allowed in its body. We also call it a context-free path
query (written CFPQ) if $m = 1$ in the body of a CCFPQ$^{s}$.

Semantically, let $\mathcal{G} = (N, A, R)$ be a CFG and $G$ an RDF graph. Given a CCFPQ
$q(?x, ?y)$ of the form (1), $[\![q(?x, ?y)]\!]_G$ is defined as follows:

$\{\mu|_{\{?x,?y\}} \mid dom(\mu) = vars(q) \text{ and } \forall i = 1, \ldots, m,\ \mu|_{vars(\alpha_i)} \in [\![\alpha_i]\!]_G\}$,    (2)

where the semantics of $v(?x, ?y)$ over $G$ is defined as follows:

$[\![v(?x, ?y)]\!]_G = \{\mu \mid dom(\mu) = \{?x, ?y\} \text{ and } \exists \pi = (\mu(?x) c_1 \ldots c_m \mu(?y)) \text{ a path in } G,\ Trace(\pi) \cap \mathcal{L}(\mathcal{G}_v) \neq \emptyset\}$.

Intuitively, $[\![v(?x, ?y)]\!]_G$ returns all pairs connected by a path in $G$ which has
a trace belonging to the language generated from this CFG starting at non-terminal $v$.

Example 1. Let Q = {N, A, R) be a CFG where N = {u, u}, A = {next :: locatedin, 
next~^ :: locatedin, next :: linkedTo, next~^ :: linkedTo}, and P = {u —>■ {next :: 
locatedin) u {next~^ :: locatedin), u —^ {next~^ :: linkedTo) u {next :: linkedTo), u —>■ 
e}. Consider a CFPQ qbe of the formz;(?a:, ly). The query q represents the relationship 
of similarity (between two genes) since C{Qy) = {{next~^ :: locatedin)'^{next~^ :: 
linkedTo){next :: linkedTo){next :: locatedin)'^ \ n > 1}. Consider the RDF graph 
Gbio in Figured] Iq(?a:, ?2/)]Gbio = {{lx = Gene(B),ly = Gene(C))}. Clearly, the 
query q returns all pairs with similarity. 

Query evaluation Let $\mathcal{G} = (N, A, R)$ be a CFG and $G$ an RDF graph. Given a CCFPQ
$q(?x, ?y)$ and a tuple $\mu = (?x = a, ?y = b)$, the query evaluation problem is to decide
whether $\mu \in [\![q(?x, ?y)]\!]_G$, that is, whether the tuple $\mu$ is in the result of the query $q$
on the RDF graph $G$. There are two kinds of computational complexity in the query
evaluation problem [?]:

- the data complexity refers to the complexity w.r.t. the size of the RDF graph $G$,
given a fixed query $q$; and

- the combined complexity refers to the complexity w.r.t. the size of query $q$ and the
RDF graph $G$.

A CFG $\mathcal{G} = (N, A, R)$ is said to be in norm form if all of its production rules are of
the form $v \to uw$, $v \to a$, or $v \to \varepsilon$ where $v, u, w \in N$ and $a \in A$. Note that this norm
form deviates from the usual Chomsky Normal Form [?] in that start non-terminals
are absent. Indeed, every CFG is equivalent to a CFG in norm form, that is, for every
CFG $\mathcal{G}$, there exists some CFG $\mathcal{G}'$ in norm form constructed from $\mathcal{G}$ in polynomial
time such that $\mathcal{L}(\mathcal{G}_v) = \mathcal{L}(\mathcal{G}'_v)$ for every $v \in N$ [14].
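
To make the norm form concrete, here is a hedged Python sketch of one possible rewriting of arbitrary production rules into rules of the form $v \to uw$, $v \to a$, or $v \to \varepsilon$. It is not claimed to be the construction of [14]: the function name `to_norm_form`, the fresh non-terminal names `_X0`, `_X1`, ..., and the rule encoding as `(head, body)` pairs are assumptions made for illustration, and the fresh names are assumed not to clash with existing symbols.

```python
from itertools import count


def to_norm_form(rules, nonterminals):
    """Rewrite rules into bodies of length 0, 1 (a terminal), or 2 (non-terminals).

    rules: list of (head, body) pairs, body being a tuple of symbols.
    """
    fresh = (f"_X{i}" for i in count())  # hypothetical fresh non-terminals
    out, wrap = [], {}

    def nt(sym):
        # Replace a terminal by a fresh non-terminal deriving exactly it.
        if sym in nonterminals:
            return sym
        if sym not in wrap:
            wrap[sym] = next(fresh)
            out.append((wrap[sym], (sym,)))
        return wrap[sym]

    for head, body in rules:
        if len(body) == 0 or (len(body) == 1 and body[0] not in nonterminals):
            out.append((head, tuple(body)))  # already in norm form
            continue
        syms = [nt(s) for s in body]
        if len(syms) == 1:
            # Unit rule v -> u becomes v -> u E with E -> epsilon.
            eps = next(fresh)
            out.append((eps, ()))
            syms.append(eps)
        while len(syms) > 2:
            # Binarize: head -> s1 rest, then continue with rest -> s2 ... sk.
            rest = next(fresh)
            out.append((head, (syms[0], rest)))
            head, syms = rest, syms[1:]
        out.append((head, tuple(syms)))
    return out


# The grammar R = {v -> ab, v -> avb} used earlier in the text.
rules = [("v", ("a", "b")), ("v", ("a", "v", "b"))]
norm = to_norm_form(rules, {"v"})
```

Because $\varepsilon$-rules are permitted for every non-terminal in this norm form, unit rules can be padded with an $\varepsilon$-deriving non-terminal instead of being eliminated, which keeps the sketch short.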

Let $G$ be an RDF graph and $\mathcal{G} = (N, A, R)$ a CFG. Given a non-terminal $v \in N$,
the context-free relation of $G$ w.r.t. $v$, denoted $\mathcal{R}_v(G)$, is defined as follows:

$\mathcal{R}_v(G) := \{(a, b) \mid \exists \pi = (a c_1 \ldots c_m b) \text{ a path in } G,\ Trace(\pi) \cap \mathcal{L}(\mathcal{G}_v) \neq \emptyset\}$.    (3)

Conveniently, the query evaluation of CCFPQ over an RDF graph can be reduced
to a conjunctive first-order query over the context-free relations. Based on the
conjunctive context-free recognizer for graphs presented in [14], we directly obtain a
conjunctive context-free recognizer (see Algorithm 1) for RDF graphs by adding a
convertor to transform an RDF graph into an edge-labeled directed graph (see Algorithm 2).

Algorithm 1 Conjunctive context-free recognizer for RDF
Input: $G$: an RDF graph; $\mathcal{G} = (N, A, R)$: a CFG in norm form; $v \in N$.
Output: $\{(v, a, b) \mid (a, b) \in \mathcal{R}_v(G)\}$
$\Theta := \{(v, a, a) \mid (a \in adom(G)) \wedge (v \to \varepsilon \in R)\}$
      $\cup\ \{(v, a, b) \mid ((a, l, b) \in Convertor(G)) \wedge (v \to l \in R)\}$
$\Theta_{new} := \Theta$
while $\Theta_{new} \neq \emptyset$ do
    pick and remove a $(v, a, b)$ from $\Theta_{new}$
    for all $(u, a', a) \in \Theta$ do
        for all $v' \to uv \in R \wedge ((v', a', b) \notin \Theta)$ do
            $\Theta_{new} := \Theta_{new} \cup \{(v', a', b)\}$
            $\Theta := \Theta \cup \{(v', a', b)\}$
        end for
    end for
    for all $(u, b, b') \in \Theta$ do
        for all $u' \to vu \in R \wedge ((u', a, b') \notin \Theta)$ do
            $\Theta_{new} := \Theta_{new} \cup \{(u', a, b')\}$
            $\Theta := \Theta \cup \{(u', a, b')\}$
        end for
    end for
end while
return $\Theta$


Given a path $\pi$ and a context-free grammar $\mathcal{G}$, Algorithm 1 is sound and complete
for deciding whether the path $\pi$ in RDF graphs has a trace generated from the grammar $\mathcal{G}$.


Proposition 1. Let $G$ be an RDF graph and $\mathcal{G} = (N, A, R)$ a CFG in norm form. For
every $v \in N$, let $\Theta$ be the result computed in Algorithm 1. Then $(v, a, b) \in \Theta$ if and only if
$(a, b) \in \mathcal{R}_v(G)$.

Moreover, we can easily observe the worst-case complexity of Algorithm 1 since
the complexity of Algorithm 2 is $O(|G|)$.

Proposition 2. Let $G$ be an RDF graph and $\mathcal{G} = (N, A, R)$ a CFG. Algorithm 1
applied to $G$ and $\mathcal{G}$ has a worst-case complexity of $O((|N||G|)^{3})$.

As a result, we can conclude the following proposition.

Proposition 3. The following hold:

1. The query evaluation of CCFPQ has polynomial data complexity;

2. The query evaluation of CCFPQ has NP-complete combined complexity.






Algorithm 2 RDF convertor
Input: $G$: an RDF graph
Output: $Convertor(G) = (V, E)$
$V := adom(G)$
$E := \{(c, self, c), (c, self::c, c) \mid c \in adom(G)\}$
$G_{new} := G$
while $G_{new} \neq \emptyset$ do
    pick and remove a triple $(s, p, o)$ from $G_{new}$
    $E := E \cup \{(s, next::p, o), (s, next, o), (o, next^{-1}::p, s), (o, next^{-1}, s),$
                $(s, edge::o, p), (s, edge, p), (p, edge^{-1}::o, s), (p, edge^{-1}, s),$
                $(p, node::s, o), (p, node, o), (o, node^{-1}::s, p), (o, node^{-1}, p)\}$
end while
return $Convertor(G)$
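
For readers who prefer executable code, Algorithms 1 and 2 can be sketched in Python roughly as follows. This is a minimal, unoptimized sketch and not the Java implementation evaluated later in the paper: the function names `convertor` and `recognizer`, the string encoding of labels, and the rule encoding (an empty tuple for $\varepsilon$, a 1-tuple for a terminal, a 2-tuple for two non-terminals) are our own assumptions. The example grammar over subClassOf is likewise hypothetical.

```python
def convertor(graph):
    """Algorithm 2 (sketch): edge-labelled view of an RDF graph given as triples."""
    nodes = {c for t in graph for c in t}
    edges = set()
    for c in nodes:
        edges |= {(c, "self", c), (c, f"self::{c}", c)}
    for s, p, o in graph:
        edges |= {
            (s, f"next::{p}", o), (s, "next", o),
            (o, f"next^-1::{p}", s), (o, "next^-1", s),
            (s, f"edge::{o}", p), (s, "edge", p),
            (p, f"edge^-1::{o}", s), (p, "edge^-1", s),
            (p, f"node::{s}", o), (p, "node", o),
            (o, f"node^-1::{s}", p), (o, "node^-1", p),
        }
    return nodes, edges


def recognizer(graph, rules):
    """Algorithm 1 (sketch): all (v, a, b) with (a, b) in the relation of v."""
    nodes, edges = convertor(graph)
    theta = set()
    for head, body in rules:
        if len(body) == 0:      # v -> epsilon
            theta |= {(head, a, a) for a in nodes}
        elif len(body) == 1:    # v -> l for a terminal l
            theta |= {(head, a, b) for (a, l, b) in edges if l == body[0]}
    binary = [(h, b) for h, b in rules if len(b) == 2]
    new = set(theta)
    while new:
        v, a, b = new.pop()
        for u, a2, b2 in list(theta):
            for head, (x, y) in binary:
                # (u, a2, a) followed by (v, a, b), for a rule head -> u v
                if b2 == a and (x, y) == (u, v) and (head, a2, b) not in theta:
                    theta.add((head, a2, b))
                    new.add((head, a2, b))
                # (v, a, b) followed by (u, b, b2), for a rule head -> v u
                if a2 == b and (x, y) == (v, u) and (head, a, b2) not in theta:
                    theta.add((head, a, b2))
                    new.add((head, a, b2))
    return theta


# Hypothetical grammar in norm form for S -> (next::subClassOf) S (next^-1::subClassOf) | epsilon
rules = [
    ("S", ()), ("S", ("U", "T")), ("T", ("S", "D")),
    ("U", ("next::subClassOf",)), ("D", ("next^-1::subClassOf",)),
]
graph = {("B", "subClassOf", "A"), ("C", "subClassOf", "A")}
theta = recognizer(graph, rules)
```

On this two-triple graph, the siblings B and C are related by S, whereas B and its superclass A are not.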





Union of CCFPQ An extension of CCFPQ capturing more expressive power, such as
disjunctive capability, is the union of CCFPQ. For instance, given a gene
(e.g., Gene(B)) in the biomedical ontology (see Figure 1), we may want to find those
genes which are relevant to this gene, that is, those genes that either are similar to it (e.g.,
Gene(C)) or belong to the same pathway (e.g., Gene(S)).

A union of conjunctive context-free path query (UCCFPQ) is of the form

$q(?x, ?y) := \bigvee_{i=1}^{m} q_i(?x, ?y)$,    (4)

where $q_i(?x, ?y)$ is a CCFPQ for all $i = 1, \ldots, m$.

Analogously, we can define union of simple conjunctive context-free path query,
written UCCFPQ$^{s}$.

Semantically, let $G$ be an RDF graph; we define

$[\![q(?x, ?y)]\!]_G = \bigcup_{i=1}^{m} [\![q_i(?x, ?y)]\!]_G$,    (5)

where $[\![q_i(?x, ?y)]\!]_G$ is defined as the semantics of CCFPQ for all $i = 1, \ldots, m$.

In Example 1, based on $\mathcal{G} = (N, A, R)$, we construct a CFG $\mathcal{G}' = (N', A', R')$
where $N' = N \cup \{s\}$, $A' = A \cup \{next::belongsTo, next^{-1}::belongsTo\}$, and
$R' = R \cup \{s \to (next::belongsTo)\, s\, (next^{-1}::belongsTo)\}$. Consider a UCCFPQ
$q(?x, ?y) := v(?x, ?y) \vee s(?x, ?y)$. Then $[\![q(?x, ?y)]\!]_{G_{bio}} = \{(?x = Gene(B), ?y =
Gene(C)), (?x = Gene(B), ?y = Gene(S))\}$. That is, $[\![q(?x, ?y)]\!]_{G_{bio}}$ returns all pairs
where the first gene is relevant to the latter.

Note that the query evaluation of UCCFPQ has the same complexity as the
evaluation of CCFPQ since we can simply evaluate a number (linear in the size of a
UCCFPQ) of CCFPQs in isolation.
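
The reduction to conjunctive queries over context-free relations, and the evaluation of a union disjunct by disjunct, can be sketched in Python as below. The sketch assumes the context-free relations have already been computed (for instance by a recognizer) and only covers atoms of the form $v(?x, ?y)$; the function names `eval_ccfpq` and `eval_uccfpq` and the atom encoding are hypothetical, and the two relations are simply taken from the result stated in the example above.

```python
def eval_ccfpq(atoms, relations, head=("?x", "?y")):
    """Bindings of the head variables that satisfy every atom.

    atoms: list of (name, var1, var2); relations: name -> set of (a, b) pairs.
    The head variables are assumed to occur in the atoms.
    """
    results = set()

    def extend(i, mu):
        if i == len(atoms):
            results.add(tuple(mu[v] for v in head))
            return
        name, x, y = atoms[i]
        for a, b in relations[name]:
            new = dict(mu)
            # setdefault keeps an existing binding, so a conflict is detected here.
            if new.setdefault(x, a) == a and new.setdefault(y, b) == b:
                extend(i + 1, new)

    extend(0, {})
    return results


def eval_uccfpq(disjuncts, relations):
    """Union of the answers of each CCFPQ, evaluated in isolation."""
    return set().union(*(eval_ccfpq(atoms, relations) for atoms in disjuncts))


# Relations assumed to be precomputed; contents follow the example in the text.
relations = {
    "v": {("Gene(B)", "Gene(C)")},
    "s": {("Gene(B)", "Gene(S)")},
}
query = [[("v", "?x", "?y")], [("s", "?x", "?y")]]
answers = eval_uccfpq(query, relations)
```

The backtracking join is exponential in the number of atoms in the worst case, which is consistent with the NP-complete combined complexity, while for a fixed query it is polynomial in the size of the relations.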

4 Expressivity of (U)(C)CFPQ 

In this section, we investigate the expressivity of (U)(C)CFPQ by referring to nested 
regular expressions and fragments of nre. 






We discuss the relations between variants of UCCFPQ and variants of (nested)
regular expressions and obtain the following results:

1. nre$_0$-triples can be expressed in CFPQ;

2. nre$_0$(N)-triples can be expressed in CCFPQ;

3. nre$_0$(|)-triples can be expressed in UCCFPQ$^{s}$;

4. nre-triples can be expressed in UCCFPQ.

1. nre$_0$ in CFPQ The following proposition shows that CFPQ can express nre$_0$-triples.

Proposition 4. For every nre$_0$-triple $(?x, e, ?y)$, there exist some CFG $\mathcal{G} = (N, A, R)$
and some CFPQ $q(?x, ?y)$ such that for every RDF graph $G$, we have
$[\![(?x, e, ?y)]\!]_G = [\![q(?x, ?y)]\!]_G$.

2. nre$_0$(N) in CCFPQ Let $\mathcal{G}$ be a CFG. A CCFPQ $q(?x, ?y)$ is in nested norm form
if the following holds:

$q(?x, ?y) := ((?x', ?y', ?z') \wedge v(?x, ?y)) \wedge q_1(?u, ?w)$,    (6)

where

- $\{?x, ?y\} \cap \{?x', ?y', ?z'\} \neq \emptyset$;

- $\{?x', ?y', ?z'\} \cap \{?u, ?w\} \neq \emptyset$;

- $q_1(?u, ?w)$ is a CCFPQ.

Note that $(?x', ?y', ?z')$ is used to express a nested nre of the form $axis::[e]$ and
$v(?x, ?y)$ is necessary to express a nested nre of the form $self::[e]$.

The following proposition shows that CCFPQ can express nre$_0$(N)-triples.

Proposition 5. For every nre$_0$(N)-triple $(?x, e, ?y)$, there exist a CFG $\mathcal{G} = (N, A, R)$
and a CCFPQ $q(?x, ?y)$ in nested norm form such that for every RDF graph $G$, we
have $[\![(?x, e, ?y)]\!]_G = [\![q(?x, ?y)]\!]_G$.

3. nre$_0$(|) in UCCFPQ$^{s}$ Let $e$ be an nre. We say $e$ is in union norm form if $e$ is of the
form $e_1 | e_2 | \cdots | e_m$ where $e_i$ is an nre$_0$(N) for all $i = 1, \ldots, m$.

We can conclude that each nre-triple is equivalent to an nre in union norm form.

Proposition 6. For every nre-triple $(?x, e, ?y)$, there exists some $e'$ in union norm form
such that $[\![(?x, e, ?y)]\!]_G = [\![(?x, e', ?y)]\!]_G$ for every RDF graph $G$.

The following proposition shows that UCCFPQ$^{s}$ can express nre$_0$(|).

Proposition 7. For every nre$_0$(|)-triple $(?x, e, ?y)$, there exist some CFG $\mathcal{G} = (N, A, R)$
and some UCCFPQ$^{s}$ $q(?x, ?y)$ in nested norm form such that for every RDF graph $G$,
we have $[\![(?x, e, ?y)]\!]_G = [\![q(?x, ?y)]\!]_G$.

4. nre in UCCFPQ By Proposition 5 and Proposition 7, we can conclude the following.

Proposition 8. For every nre-triple $(?x, e, ?y)$, there exist some CFG $\mathcal{G} = (N, A, R)$
and some UCCFPQ $q(?x, ?y)$ in nested norm form such that for every RDF graph $G$,
we have $[\![(?x, e, ?y)]\!]_G = [\![q(?x, ?y)]\!]_G$.

However, the converses of the results above do not hold, since context-free
languages are not expressible in any nre.

Proposition 9. CFPQ is not expressible in any nre.


5 Context-free SPARQL 


In this section, we introduce context-free SPARQL (for short, cfSPARQL), an extension
of SPARQL that uses context-free triple patterns plus the basic SPARQL
operators UNION, AND, OPT, FILTER, and SELECT, and we discuss its expressiveness.

A context-free triple pattern (cftp) is of the form $(?x, q, ?y)$ where $q(?x, ?y)$ is a
CFPQ. Analogously, we can define union of conjunctive context-free triple pattern (for
short, uccftp) by using UCCFPQ.

cfSPARQL and query evaluation Formally, cfSPARQL (graph) patterns are
recursively constructed from context-free triple patterns:

- A cftp is a cfSPARQL pattern;

- A triple pattern of the form $(?x, ?y, ?z)$ is a cfSPARQL pattern;

- $P_1$ UNION $P_2$, $P_1$ AND $P_2$, and $P_1$ OPT $P_2$ are cfSPARQL patterns if $P_1$, $P_2$
are cfSPARQL patterns;

- $P$ FILTER $C$ is a cfSPARQL pattern if $P$ is a cfSPARQL pattern and $C$ is a constraint;

- SELECT$_S(P)$ is a cfSPARQL pattern if $P$ is a cfSPARQL pattern and $S$ is a set of variables.

Remark 2. In cfSPARQL, we allow triple patterns of the form $(?x, ?y, ?z)$ (see Item 2),
which can express any SPARQL triple pattern together with FILTER [?], to ensure
that SPARQL is still expressible in cfSPARQL, while SPARQL is not expressible in
nSPARQL since any triple pattern $(?x, ?y, ?z)$ is not expressible in nSPARQL [28].
Our generalization of nSPARQL inherits the power of queries without more cost and
maintains the coherence between CFPQ and “nested” nre of the form $axis::[e]$.
Moreover, this extension in cfSPARQL coincides with our proposed CCFPQ where triple
patterns of the form $(?x, ?y, ?z)$ are allowed.

Semantically, let $P$ be a cfSPARQL pattern and $G$ an RDF graph. $[\![(?x, q, ?y)]\!]_G$ is
defined as $[\![q(?x, ?y)]\!]_G$, and the other cfSPARQL patterns are defined as usual
[28,27].

Proposition 10. SPARQL is expressible in cfSPARQL but not vice versa. 

A cfSPARQL query is a pattern. 

We can define union of conjunctive context-free SPARQL query (for short, uccfS- 
PARQL) by using uccftp in the analogous way. 

At the end of this subsection, we discuss the complexity of the evaluation problem of
uccfSPARQL queries.

For a given RDF graph $G$, a uccftp $P$, and a mapping $\mu$, the query evaluation
problem is to decide whether $\mu$ is in $[\![P]\!]_G$.

Proposition 11. The evaluation problem of uccfSPARQL queries has the same
complexity as the evaluation problem of SPARQL queries.

As a direct result of Proposition 8, we can conclude the following.

Corollary 1. nSPARQL is expressible in uccfSPARQL but not vice versa.




On the expressiveness of cfSPARQL In this subsection, we show that cfSPARQL has
the same expressiveness as uccfSPARQL. In other words, cfSPARQL is enough to
express UCCFPQ on RDF graphs.

Since every cfSPARQL pattern is a uccfSPARQL pattern, we merely show that
uccfSPARQL is expressible in cfSPARQL.

Proposition 12. For every uccfSPARQL pattern $P$, there exists some cfSPARQL pattern
$Q$ such that $[\![P]\!]_G = [\![Q]\!]_G$ for any RDF graph $G$.

6 Relations on (nested) regular expressions with negation 

In this section, we discuss both the relation between UCCFPQ and nested regular
expressions with negation and the relation between cfSPARQL and variants of nSPARQL.

Nested regular expressions with negation A nested regular expression with negation
(nre$^{\neg}$) is an extension of nre by adding two new operators: “difference ($e_1 - e_2$)” and
“negation ($e^{c}$)” [?].

Semantically, let $e$, $e_1$, $e_2$ be three nre$^{\neg}$s and $G$ an RDF graph,

- $[\![e_1 - e_2]\!]_G = \{(a, b) \in [\![e_1]\!]_G \mid (a, b) \notin [\![e_2]\!]_G\}$;

- $[\![e^{c}]\!]_G = \{(a, b) \in adom(G) \times adom(G) \mid (a, b) \notin [\![e]\!]_G\}$.

Analogously, an nre$^{\neg}$-triple pattern is of the form $(?x, e, ?y)$ where $e$ is an nre$^{\neg}$.
Clearly, an nre$^{\neg}$-triple pattern is non-monotone.

Since nre is monotone, nre is strictly subsumed in nre$^{\neg}$ [?]. Though property paths
in SPARQL 1.1 [33,29] are not expressible in nre since property paths allow the
negation of IRIs, property paths can still be expressed in the following subfragment of
nre$^{\neg}$: let $c, c_1, \ldots, c_{n+m} \in U$,

$e := next::c \mid e/e \mid self::[e] \mid e^{*} \mid e^{+} \mid next^{-1}::[e] \mid$
$(next::c_1 | \ldots | next::c_n | next^{-1}::c_{n+1} | \ldots | next^{-1}::c_{n+m})^{c}$.

Note that $e^{+}$ can be expressed as the expression $e^{*} - self$.

Proposition 13. uccftp is not expressible in any nre$^{\neg}$-triple pattern.

Due to the non-monotonicity of nre$^{\neg}$, we have that nre$^{\neg}$ is beyond the
expressiveness of any union of conjunctive context-free triple patterns, even for the star-free
nre$^{\neg}$ (for short, sf-nre$^{\neg}$) where the Kleene star (*) is not allowed in nre$^{\neg}$.

Proposition 14. sf-nre$^{\neg}$-triple pattern is not expressible in any uccftp.

In short, nre$^{\neg}$-triple patterns and uccftps cannot express each other. Indeed, negation
can make the evaluation problem hard even when only a limited form of negation, such
as in property paths, is allowed [?].

cfSPARQL can express nSPARQL$^{\neg}$ Following nSPARQL, we can analogously
construct the language nSPARQL$^{\neg}$, which is built on nre$^{\neg}$, by adding the SPARQL operators
UNION, AND, OPT, FILTER, and SELECT.

Though uccftps cannot express nre$^{\neg}$-triple patterns by Proposition 13, cfSPARQL
can express nSPARQL$^{\neg}$ since nSPARQL$^{\neg}$ is still expressible in nSPARQL [?].

Corollary 2. nSPARQL$^{\neg}$ is expressible in cfSPARQL.




6.1 Overview 


Finally, Figure 2 and Figure 3 summarize the implications of the results on RDF graphs
for the general relations between variants of CFPQ and nre and the general relations
between cfSPARQL and nSPARQL, where $\mathcal{L}_1 \to \mathcal{L}_2$ denotes that $\mathcal{L}_1$ is expressible in
$\mathcal{L}_2$ and $\mathcal{L}_1 \leftrightarrow \mathcal{L}_2$ denotes that $\mathcal{L}_1 \to \mathcal{L}_2$ and $\mathcal{L}_2 \to \mathcal{L}_1$. Analogously, nSPARQL$^{sf}$ is an
extension of SPARQL by allowing star-free nre$^{\neg}$-triple patterns.

Fig. 2: Known relations between variants of CFPQ and variants of nre.

Fig. 3: Known relations between variants of cfSPARQL and variants of nSPARQL.

7 Implementation and evaluation 

We have implemented the two algorithms for CFPQs without any optimization.
Two context-free path queries over RDF graphs were evaluated, and we found
some results which cannot be captured by any regular expression-based path queries
over RDF graphs.

The experiments were performed under Windows 7 on an Intel i5-760 2.80GHz CPU
system with 6GB memory. The program was written in Java 7 with a maximum of 2GB heap
space allocated for the JVM. Ten popular ontologies like foaf, wine, and pizza were used
for testing.

Query 1 Consider a CFG G1 = (N, A, R) where N = {S}, A = {next⁻¹::subClassOf,
next::subClassOf, next⁻¹::type, next::type}, and R = {S → (next⁻¹::subClassOf) S
(next::subClassOf), S → (next⁻¹::type) S (next::type), S → ε}. The query Q1 based on
the grammar G1 returns all pairs of concepts or individuals at the same layer of the
hierarchy of an RDF graph. Table 1 shows the experimental results of Q1 over the
testing ontologies. Note that #results denotes the number of pairs of concepts or
individuals returned by Q1.

Taking the ontology foaf as an example, the query Q1 over foaf returns pairs of
concepts such as (foaf:Document, foaf:Person), which shows that the two concepts,
Document and Person, are at the same layer of the hierarchy of foaf, where the top
concept (owl:Thing) is at the first layer.
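
As an illustration of how the three rules of the grammar in Query 1 can be read as a fixpoint computation, here is a small sketch over a hypothetical set of triples. It assumes the nSPARQL reading of the axes, in which next::p moves from the subject of a p-triple to its object and next⁻¹::p moves from the object back to the subject; the function name `evaluate_g1` and all class names in the toy data are invented for the example and do not come from the tested ontologies.

```python
def evaluate_g1(triples):
    """Compute the pairs derived from S under the three rules of Query 1.

    Assumed axis semantics: next::p goes from subject to object of a p-triple,
    and next^-1::p goes from object to subject.
    """
    triples = list(triples)
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}

    # S -> epsilon: every node is paired with itself.
    pairs = {(node, node) for node in nodes}

    changed = True
    while changed:
        changed = False
        for predicate in ("subClassOf", "type"):
            edges = [(s, o) for s, p, o in triples if p == predicate]
            # S -> (next^-1::p) S (next::p):
            # step back from o1 to s1, follow an S-pair (s1, s2),
            # then step forward from s2 to o2.
            for s1, o1 in edges:
                for s2, o2 in edges:
                    if (s1, s2) in pairs and (o1, o2) not in pairs:
                        pairs.add((o1, o2))
                        changed = True
    return pairs


# Hypothetical toy hierarchy, not taken from any of the tested ontologies.
toy_triples = [
    ("TA", "subClassOf", "Student"),
    ("TA", "subClassOf", "Employee"),
    ("Student", "subClassOf", "Person"),
    ("Employee", "subClassOf", "Agent"),
    ("alice", "type", "TA"),
]
non_trivial = {(x, y) for x, y in evaluate_g1(toy_triples) if x != y}
print(sorted(non_trivial))
```

Under the assumed axis semantics, two nodes are paired when they are reached by the same number of matching subClassOf or type steps from a common node; here Student and Employee are paired through TA, and Person and Agent are paired one step further up. If the axes are oriented the other way, the roles of subject and object in the inner loop swap.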

Query 2 Similarly, consider a CFG G2 = (N, A, R) where N = {S, B}, A =
{next⁻¹::subClassOf, next::subClassOf}, and R = {S → BS, B → (next::subClassOf) B
(next⁻¹::subClassOf), B → B (next⁻¹::subClassOf) B (next::subClassOf)
(next⁻¹::subClassOf), S → ε}. The query Q2 based on the grammar G2 returns all pairs
of concepts that are at two adjacent layers of the hierarchy of an RDF graph. Taking
the ontology foaf again as an example, the query Q2 over foaf returns pairs of
concepts such as (foaf:Person, foaf:PersonalProfileDocument), which indicates that
Person is at a higher layer than PersonalProfileDocument, since
PersonalProfileDocument is a subclass of Document. Table 1 shows the experimental
results of Q2 over the testing ontologies.


Table 1: The evaluation results of Q1 and Q2

                                             Query 1             Query 2
Ontology                      #triples  time(ms)  #results  time(ms)  #results
protege                             41       468       509         5         0
funding                            144       499       296       125        77
skos                               254      1044       810        16         1
foaf                               454      5027      1929      1154       324
generation                         319      6091      2164        13         0
univ-bench                         306     20981      2540       532       228
travel                             327     13971      2499       281       151
people+pets                        703     82081      9472       247       120
biomedical-measure-primitive       459    420604     15156   1068851      9178
atom-primitive                     561    515285     15454   4711499     13940
pizza                             1980   3233587     56195    255853      4694
wine                              2012   4075319     66572       273        79


8 Conclusions 

In this paper, we have proposed context-free path queries (including some variants)
for navigating an RDF graph, and the context-free SPARQL query language for RDF,
built on context-free path queries by adding the standard SPARQL operators. We have
investigated some fundamental properties of these context-free path queries and of
their context-free SPARQL query languages. We proved that CFPQ, CCFPQ, UCCFPQ^s,
and UCCFPQ strictly express basic nested regular expressions (nre₀), nre₀(N),
nre₀(|), and nre, respectively. Moreover, uccfSPARQL has the same expressiveness as
cfSPARQL, and both SPARQL and nSPARQL are expressible in cfSPARQL. Furthermore, we
looked at the relationship between context-free path queries and nested regular
expressions with negation (which can express property paths in SPARQL 1.1) and the
relationship between cfSPARQL queries and nSPARQL queries with negation (nSPARQL¬).
We found that neither CFPQ nor nre¬ can express the other, while nSPARQL¬ is still
expressible in cfSPARQL. Finally, we discussed
the query evaluation problem of CFPQ and cfSPARQL on RDF graphs. The query
evaluation of UCCFPQ has polynomial-time data complexity and NP-complete combined
complexity, the same as conjunctive first-order queries, and the query evaluation of
cfSPARQL has the same complexity as that of SPARQL. These results provide a starting
point for further research on the expressiveness of navigational languages for RDF
graphs and on the relationships among regular path queries, nested regular path
queries, and context-free path queries on RDF graphs.

There are a number of practical open problems. In this paper, we assume that RDF
data does not contain blank nodes, following the same treatment as in nSPARQL. We
must admit that blank nodes do make RDF data more expressive, since a blank node in
RDF is taken as an existentially quantified variable El. An interesting direction
for future work is to extend our proposed (U)(C)CFPQ to general RDF data with blank
nodes by allowing path variables, which are already supported in some extensions of
SPARQL such as SPARQ2L [7], SPARQLeR [19], PSPARQL Q, and CPSPARQL [3,4], all of
which are popular for querying general RDF data with blank nodes.